47 research outputs found

    Two-dimensional triangular mesh-based mosaicking for object tracking in the presence of occlusion

    In this paper, we describe a method for the temporal tracking of video objects in video clips. We employ a 2D triangular mesh to represent each video object, which allows us to describe the motion of the object by the displacements of the node points of the mesh and to describe any intensity variations by contrast and brightness parameters estimated for each node point. Using the temporal history of the node point locations, we continue tracking the nodes of the 2D mesh even when they become invisible because of self-occlusion or occlusion by another object. Uncovered parts of the object in subsequent frames of the sequence are detected by means of an active contour that contains a novel shape-preserving energy term. The proposed shape-preserving energy term is found to be successful in tracking the boundary of an object in video sequences with complex backgrounds. By adding new nodes to or updating the 2D triangular mesh, we incrementally append the uncovered parts of the object detected during tracking to generate a static mosaic of the object. Also, by texture-mapping the covered pixels into the current frame of the video clip, we can generate a dynamic mosaic of the object. The proposed mosaicking technique is more general than those reported in the literature because it allows for local motion and out-of-plane rotations of the object that result in self-occlusion. Experimental results demonstrate successful tracking of objects with deformable boundaries in the presence of occlusion.
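
    The abstract does not give the exact prediction rule used for occluded nodes, so the following is only a minimal sketch of carrying 2D mesh nodes forward from their temporal history, assuming a constant-velocity model; the function name and the NumPy representation are illustrative, not the paper's implementation.

        import numpy as np

        def propagate_nodes(history, visible, measured_next):
            """Node positions of a 2D triangular mesh in the next frame.

            history       : list of (N, 2) arrays, node coordinates in past frames
            visible       : (N,) bool, True where the node could be matched in the
                            next frame (i.e. it is not occluded there)
            measured_next : (N, 2) array, positions found by motion estimation;
                            only rows where visible is True are meaningful

            Occluded nodes are extrapolated with a constant-velocity model from the
            last two frames; this particular rule is an assumption, since the
            abstract only states that the temporal history of node locations is used.
            """
            prev, curr = history[-2], history[-1]
            extrapolated = curr + (curr - prev)            # constant-velocity guess
            return np.where(visible[:, None], measured_next, extrapolated)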

    Simultaneous 3-D motion estimation and wire-frame model adaptation including photometric effects for knowledge-based video coding

    We address the problem of 3-D motion estimation in the context of knowledge-based coding of facial image sequences. The proposed method handles the global and local motion estimation and the adaptation of a generic wire-frame model to a particular speaker simultaneously, within an optical-flow-based framework that includes the photometric effects of motion. We use a flexible wire-frame model whose local structure is characterized by the normal vectors of the patches, which are related to the coordinates of the nodes. Geometrical constraints that describe the propagation of the movement of the nodes are introduced and then efficiently utilized to reduce the number of independent structure parameters. A stochastic relaxation algorithm is used to determine optimum global motion estimates and the parameters describing the structure of the wire-frame model. For the initialization of the motion and structure parameters, a modified feature-based algorithm is used. Experimental results with simulated facial image sequences are given. © 1994 IEEE
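
    For orientation, coupling motion with photometric effects in an optical-flow framework is commonly expressed by relaxing the brightness-constancy constraint with a multiplicative and an additive term; the formula below is a generic illustration with our own symbols, not the paper's exact model:

        I(x + u,\, y + v,\, t + 1) \;\approx\; c(x, y)\, I(x, y, t) + b(x, y)

    where (u, v) is the displacement induced by the global/local motion and the wire-frame structure parameters, and c and b account for contrast and brightness changes caused by the photometric effects of motion.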

    Multimodal speaker identification using an adaptive classifier cascade based on modality reliability


    Tracking motion and intensity variations using hierarchical 2-D mesh modeling for synthetic object transfiguration

    We propose a method for tracking the motion and intensity variations of a 2-D mildly deformable image object using a hierarchical 2-D mesh model. The proposed method is applied to synthetic object transfiguration, namely, replacing an object in a real video clip with another synthetic or natural object via digital postprocessing. Successful transfiguration requires accurate tracking of both motion and intensity (contrast and brightness) variations of the object-to-be-replaced so that the replacement object can be rendered in exactly the same way from a single still picture. The proposed method is capable of tracking image regions corresponding to scene objects with nonplanar and/or mildly deforming surfaces, accounting for intensity variations, and is shown to be effective with real image sequences. © 1996 Academic Press, Inc
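
    As an illustration of how tracked per-node intensity parameters could be applied when rendering the replacement object, here is a small sketch that shades one warped triangle by barycentric interpolation of node contrast and brightness; the interpolation rule and all names are assumptions, not the paper's implementation.

        import numpy as np

        def render_triangle(texture_patch, bary, node_contrast, node_brightness):
            """Shade one warped triangle of the replacement object.

            texture_patch   : (M,) texture samples already warped into the current
                              frame, one value per pixel inside the triangle
            bary            : (M, 3) barycentric coordinates of those pixels with
                              respect to the triangle's three mesh nodes
            node_contrast   : (3,) contrast parameters tracked at the three nodes
            node_brightness : (3,) brightness parameters tracked at the three nodes
            """
            c = bary @ node_contrast       # per-pixel contrast (assumed interpolation)
            b = bary @ node_brightness     # per-pixel brightness
            return c * texture_patch + b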

    Semantics of Multimedia in MPEG-7

    In this paper, we present the tools standardized by MPEG-7 for describing the semantics of multimedia. In particular, we focus on the abstraction model, entities, attributes, and relations of MPEG-7 semantic descriptions. MPEG-7 tools can describe the semantics of specific instances of multimedia, such as one image or one video segment, but can also generalize these descriptions either to multiple instances of multimedia or to a set of semantic descriptions. The key components of MPEG-7 semantic descriptions are semantic entities such as objects and events, attributes of these entities such as labels and properties, and, finally, relations between these entities, such as an object being the patient of an event. The descriptive power and usability of these tools have been demonstrated in numerous experiments and applications, which makes them key candidates for enabling intelligent applications that deal with multimedia at a human level.
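
    To make the entity/attribute/relation structure concrete, the following is a small illustrative model in Python; the class names and the "patientOf" relation label are our own shorthand for the kinds of components described above, not the MPEG-7 description schemes or their XML syntax.

        from dataclasses import dataclass, field

        @dataclass
        class SemanticEntity:
            # A semantic entity such as an object or an event, with a label
            # and free-form property attributes.
            kind: str                          # "object" or "event"
            label: str
            properties: dict = field(default_factory=dict)

        @dataclass
        class SemanticRelation:
            # A typed, directed relation between two semantic entities.
            name: str                          # e.g. "patientOf"
            source: SemanticEntity
            target: SemanticEntity

        # Example: an object that is the patient of an event.
        ball = SemanticEntity("object", "ball", {"colour": "red"})
        kick = SemanticEntity("event", "kick")
        rel = SemanticRelation("patientOf", ball, kick)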

    Quantum state-dependent diffusion and multiplicative noise: a microscopic approach

    State-dependent diffusion, which concerns the Brownian motion of a particle in inhomogeneous media, has been described phenomenologically in a number of ways. Based on a system-reservoir nonlinear coupling model, we present a microscopic approach to quantum state-dependent diffusion and multiplicative noise in terms of a quantum Markovian Langevin description and an associated Fokker-Planck equation in position space in the overdamped limit. We examine the thermodynamic consistency and explore the possibility of observing a quantum current, a generic quantum effect, as a consequence of this state-dependent diffusion, similar to the one proposed by Büttiker [Z. Phys. B 68, 161 (1987)] in a classical context several years ago. Comment: To be published in Journal of Statistical Physics; 28 pages, 3 figures.
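
    For context only: in the classical setting, state-dependent diffusion with multiplicative noise is usually written as an overdamped Langevin equation and an associated position-space Fokker-Planck (Smoluchowski) equation of the generic form

        \dot{x} = f(x) + g(x)\,\eta(t), \qquad \langle \eta(t)\,\eta(t') \rangle = 2 D_0\, \delta(t - t')

        \partial_t P(x, t) = \partial_x \left[ -A(x)\, P(x, t) + \partial_x \big( D(x)\, P(x, t) \big) \right], \qquad D(x) \propto g^2(x)

    where the state-dependent diffusion coefficient D(x) reflects the inhomogeneity of the medium (here, the nonlinearity of the system-reservoir coupling). These standard classical forms are shown only to fix the structure; the paper's quantum result is not reproduced here.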

    Fast H.264/AVC video encoding with multiple frame references

    IEEE International Conference on Image Processing (ICIP 2005), 11–14 September 2005, Genova.
    We focus on the question of how to select the best multiple reference pictures for enhanced H.264 video encoding with a fast, computationally efficient method. We propose a simple histogram-similarity-based method for selecting the best set of multiple reference pictures. Out-of-order coding of these frames is implemented by means of pyramid encoding. Experimental results show that the proposed approach can provide encoding time savings of up to 23% with similar picture quality and bitrate for selected video sequences. © 2005 IEEE
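
    A minimal sketch of a histogram-similarity ranking of candidate reference pictures is given below; the abstract only says "histogram similarity", so the intersection metric, the bin count, and the number of kept references are assumptions.

        import numpy as np

        def select_references(current, candidates, k=4, bins=32):
            """Rank candidate reference frames by luma-histogram similarity."""
            def hist(img):
                h, _ = np.histogram(img, bins=bins, range=(0, 256))
                return h / max(h.sum(), 1)

            h_cur = hist(current)
            scores = [np.minimum(h_cur, hist(c)).sum() for c in candidates]
            order = np.argsort(scores)[::-1]          # most similar first
            return [int(i) for i in order[:k]]        # indices into candidates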

    H.264 encoding of videos with large number of shot transitions using long-term reference pictures

    14th European Signal Processing Conference (EUSIPCO 2006), 4–8 September 2006, Florence.
    Long-term reference prediction is an important feature of the H.264/AVC standard; it provides a trade-off between gain and complexity. A simple long-term reference selection method is presented for videos with frequent shot/view transitions in order to optimize compression efficiency at the shot boundaries. Experimental results show up to a 50% reduction in the number of bits, at the same PSNR, for frames at the border of transitions.
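
    The selection rule itself is not spelled out in the abstract, so the sketch below is only one plausible reading: declare a shot boundary when consecutive luma histograms differ strongly and keep the first frame of each new shot as a long-term reference. Both the detector and the "first frame" rule are assumptions made for illustration.

        import numpy as np

        def assign_long_term_refs(frames, threshold=0.4, bins=32):
            """Pick one long-term reference frame per detected shot."""
            def hist(img):
                h, _ = np.histogram(img, bins=bins, range=(0, 256))
                return h / max(h.sum(), 1)

            long_term = [0]                            # frame 0 starts the first shot
            prev = hist(frames[0])
            for i in range(1, len(frames)):
                cur = hist(frames[i])
                if np.abs(cur - prev).sum() > threshold:
                    long_term.append(i)                # new shot -> new long-term ref
                prev = cur
            return long_term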

    Unequal inter-view rate allocation using scalable stereo video coding and an objective stereo video quality measure

    2008 IEEE International Conference on Multimedia and Expo (ICME 2008), 23–26 June 2008, Hannover.
    In stereoscopic 3D video, it is well known that humans can perceive high-quality 3D video provided that one of the views is of high quality. Hence, in stereo video encoding, the best overall rate versus perceived-distortion performance may be achieved by reducing the spatial, temporal, and/or quantization resolution of the second view while keeping the first view at full resolution. In this paper, we address the selection of the best unequal inter-view rate allocation strategy, depending on the content of the video, for a scalable multi-view video codec (SMVC) [1]. Since perceived 3D video quality does not correlate well with the average PSNR of the two views, we propose a new quantitative measure that uses a weighted combination of the two PSNR values and a jerkiness measure. We verified that unequal rate allocation between the left and right views results in better perceived stereo video quality. © 2008 IEEE
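
    The paper's weighting is not given in the abstract, so the following is only a hypothetical form of a measure that combines the two view PSNRs with a jerkiness penalty; all weight values and the choice of favouring the better view are assumptions.

        def stereo_quality(psnr_left, psnr_right, jerkiness,
                           w_best=0.7, w_other=0.3, w_jerk=0.5):
            """Illustrative weighted combination of view PSNRs and jerkiness."""
            better = max(psnr_left, psnr_right)
            worse = min(psnr_left, psnr_right)
            return w_best * better + w_other * worse - w_jerk * jerkiness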